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We present a method to perform fully selfconsistent density-functional calculations, which scales 
linearly with the system size and which is well suited for very large systems. It uses strictly lo- 
calized pseudoatomic orbitals as basis functions. The sparse Hamiltonian and overlap matrices are 
calculated with an 0{N) effort. The long range selfconsistent potential and its matrix elements are 
computed in a real-space grid. The other matrix elements are directly calculated and tabulated 
as a function of the interatomic distances. The computation of the total energy and atomic forces 
is also done in 0{N) operations using truncated, Wannier-like localized functions to describe the 
occupied states, and a band-energy functional which is iteratively minimized with no orthogonality 
constraints. We illustrate the method with several examples, including carbon and silicon supercells 
with up to 1000 Si atoms and supercells of P-C3N4. We apply the method to solve the existing 
controversy about the faceting of large icosahedral fuUerenes by performing dynamical simulations 
on Ceo, C240, and C540. 



A large effort has been devoted in the last few years 
to develop approximate methods to solve the electronic 
structure of large systems with a computational cost pro- 
portional to its size.El Several approaches are now suf- 
ficiently accurate and robust to obtain reliable results 
for systems with thousands of atoms. So far, how- 
ever, most of these schemes have been useful only with 
simple Hamiltonians, like empirical tight-binding mod- 
els, which provide an ideal setting for order-iV calcu- 
lations. First-principles order-TV calculations have been 
performed mainly in the non-selfconsistent Harris func- 
tional versionEl of the local density approximation (LDA) 
for electroiu&j exchange and correlation (XC) using min- 
imal bases .u'u Linear scaling algorithms in fully selfcon- 
sistent LDA have also been tried,a but the results are far 
from the linear scaling regime, due to the relatively small 
number of inanagcable atoms in those simulations. Her- 
nandez et alB have successfully produced LDA results in 
large silicon systems using a real-space grid method. The 
computational requirements that this kind of approach 
demands are, however, extremely large, and calculations 
must be performed in massive computational platforms. 

We have developed a selfconsistent density-functional 
formulation with linear scaling, capable of producing re- 
sults for very large systems, whose computational de- 
mands are not overwhelmingly large, so that systems 
with many hundreds of atoms can be treated in mod- 
est computational platforms like workstations, and much 
larger systems can be treated in massive platforms. The 
method is based on the linear combination of atomic or- 
bitals (LCAO) approximation as basis of expansion of 
the electronic states. Non-orthogonal LCAO bases are 
very efficient, reducing the number of variables dramat- 
ically, compared to plane-wave (PW) or real-space-grid 
approaches, so that larger systems can be studied. Also, 
LCAO can provide up to extremely accurate bases, stay- 



ing always in the range of a few valence orbitals per 
atom.Q As a first step, in this work we use minimal basis 
sets of one s and three p orbitals per atom, the exten- 
sion to larger bases being perfectly possible within the 
present formulation. The choice of a basis obviously im- 
plies a possible error associated to its incompleteness. In 
the same way as for the error due to the linear scaling 
algorithm, the error in the basis can be reduced at the 
expense of an increase in computational effort. Its mag- 
nitude should be carefully checked, but also compared 
with other sources of error to ensure that an increase of 
the basis is really worthwhile. 

Our method uses standard LDA technique si for the 
valence electrons ^ the core electrons being replaced by 
pseudopotentials.B The basis orbitals used in this work 
are the s and q,pseudoatomic orbitals defined by Sankey 
and Niklewski,Ll in the context of non-selfconsistent Har- 
ris functional methods. These are slightly excited pseu- 
doatomic orbitals 4>fj.{r), obtained by solving the valence 
electron problem in the isolated atom, with the same 
pseudopotential and LDA approximations, and with the 
boundary condition that the atomic orbitals are strictly 
localized, vanishing outside a given radius re- This ra- 
dius cutoff is in principle orbital dependent, but we do not 
make explicit this dependence in the equations only for 
simplicity in the notation. The great advantage of these 
orbitals is that they give rise to sparse overlap and Hamil- 
tonian matrices (since matrix elements between distant 
orbitals exactly vanish exactly) and they display the same 
structure as in conventional tight-binding. The extent of 
the interactions and the sparseness of the matrices de- 
pend on the cutoff radius of each atom. These are not 
critical as long as the maxima of the atomic wave func- 
tions are well within re- For an analysis of the quality of 
pseudoatomic orbitals as a basis for solid state calcula- 
tions we refer the reader to Ref. O. 
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FIG. 1. Convergence of the total energy per carbon atom 
vs grid fineness (given by the cutoff Ecut of the plane waves 
that it can represent). The results of the present method are 
shown for a diamond supercell with 64 atoms (full circles) 
and for a C3 cluster (diamonds). Open circles show results of 
conventional plane-wave calculations for diamond. 

The LDA Hamiltonian matrix elements for a given par- 
ticle density are obtained using a combination of tech- 
niques, adopting the most convenient one for each term 
of the Hamiltonian. In a prior step, to avoid dealing 
with the long range of the pseudopotentials, we rewrite 
the Kohn-Sham Hamiltonian by adding and subtracting 
the Hartree potential created by the neutral-atom charge 
no(r), defined as 

no(r)^^nfA(r-R,) (1) 

i 

where i runs over the atoms in the system, and is the 
spherical atomic charge density of the atom i in its neu- 
tral, isolated state with p° electrons on each orbital (p^. 
If we define 5n(v) — n{r) —no{r) where n(r) is the actual 
charge density, the Hartree potential can be decomposed 
into two contributions and V^, created by Sn{r) and 
no{r) respectively. Using Eq. (|ip, can be expressed 
as a sum of atomic contributions. Also, the pseudopo- 
tential is decomposed into a short-range non-local term 
Vnl and a long-range local term Vl- Following Sankey 
and Niklewskijj we define the neutral atom potential of 
a given atom at as 

VNA(r - R.) - V^{r - R,) + / -^dr'. (2) 

J |r-r'| 

^NA is short ranged, since the core attraction and the 
electron Coulomb repulsion of the neutral atom charge 
cancel each other beyond re- The Kohn-Sham Hamilto- 
nian is finally obtained as 

if^' = 1^ + V [^NL(r - R.) + VkA(r - R.)] 

i 

+ T/H'(r) + ^5cc(r) (3) 

The overlap, kinetic energy term, neutral atom poten- 
tial and non-local part of the pseudopotential, are all in- 



dependent of the charge density n(r), and their matrix el- 
ements between atomic orbitals can be expressed as sums 
of two center [S^v={4'ti\'t'u) and {(j)^\p'^ /2m\(j)v)) or three 
center (((/ip|X4^L(r - Ri)|'/>i.) and (0^|VNA(r - Ri)| 
integrals, which only depend on the relative positions of 
pairs or triplets of atoms. We^ follow the method pro- 
posed by Sankey and Niklewskicl to compute all these in- 
tegrals: they are calculated beforehand and tabulated as 
a function of the relative position of the centers. These 
tables are used during the simulation, to calculate all 
the non-zero integrals by interpolation. The details of 
the procedure can be found in Ref. ^ Since all these 
integrals are zero for distant enough atoms, their num- 
ber scales linearly with the size of the system, as well as 
the computation time. The contributions of these terms 
to the Hamiltonian are computed only once for a given 
atomic configuration, since they do not depend on the 
selfconsistent charge. 

The matrix elements of the Hartree potential V[f(r) 
created by the charge (5n(r) and the exchange-correlation 
potential T^cc [«(r)] both depend on the selfconsistent 
charge. To calculate these integrals we compute no(r), 
n(r) and (5n(r) for a given LCAO density matrix at 
the points of a regular grid in real space. This is 
straightforward since the basis orbitals are defined in real 
space. Poisson's equation for the Hartree potential can 
be then solved by the standard fast Fourier transform 
(FFT) method, assiiming a supercell geometry, or by the 
multigrid method.EJ In spite of its iVlogA^ scaling, we 
presently use FFT's for simplicity, since this part rep- 
resents a minor contribution to the total computational 
load. Note that only two FFT's are necessary per cy- 
cle of selfconsistency (SCF cycle), in contrast with PW 
based calculations, where an FFT is required for each 
electronic state. The LDA XC potential is trivially com- 
puted on each point of the grid. Once the value of the 
Hartree and the XC potentials are known at every point, 
the integrals (0p|V^|0i,) and (0p|Vxc|0i/) are computed 
by direct summation on the grid. These sums are care- 
fully done to minimize the amount of numerical workload 
involved. Only the non-zero integrals (between orbitals 
on atoms closer than 2rc) are computed, and only the 
points of the grid for which both orbitals are non-zero 
contribute to each integral. We use sparse-matrix mul- 
tiplication techniques optimized for this class of opera- 
tions. As a result, the computation of the integrals scales 
linearly with the size of the system. 

It is important to stress that the convergence with grid 
spacing of our method is different from that in standard 
PW calculations, which are known to require large PW 
cutoffs for systems containing atoms with hard pseudopo- 
tentials. In Figure |]we show the convergence of the to- 
tal energy per atom (referred to the converged value) for 
carbon, as a function of Ecut, the kinetic energy cutoff of 
the plane waves that the grid can represent. Full circles 
are for a diamond supercell of 64 atoms, whereas dia- 
monds are for a cluster of 3 carbon atoms in a supercell 
of 15x15x15 A3. In both cases, the results are converged 
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to below 2 meV/atom for a cutoff of 30 Ry. This is in 
sharp contrast with results of PW calculationsE3 (open 
circles) in which the cutoff necessary to achieve conver- 
gence (with the same pseudopotcntial) is much higher. 
Note, moreover, that the energy cutoff in our case refers 
to the grid representation of the charge density, whereas 
in the PW case it refers to the wave functions, which im- 
plies an even higher (four times) cutoff in the charge den- 
sity. The reason for the fast convergence of our approach 
is that most of the Hamiltonian terms (most importantly 
the kinetic energy and the neutral atom potential) are not 
computed in the grid. 

Once the Kohn-Sham Hamiltonian has beerucibtained, 
we use a recently proposed order- methodll3a to com- 
pute the band structure energy Ebs (sum of occupied 
eigenvalues} __In this approach, a modified band energy 
functionaLJO is minimizedll3 with respect to the elec- 
tronic orbitals by means of a conjugate gradients (CG) 
algorithm, to yield -Ebs- The orthonormality of the occu- 
pied states does not need to be imposed, but it is obtained 
as a result of the minimization of the energy functional. 
The elimination of the orthogonalization is the first step 
to achieve an order-A^ scaling. The second is the use of 
localized, Wannier-like wave functions (LWF) to describe 
the electronic states entering the minimization of the en- 
ergy functional. Truncation of these localized functions 
beyond a given cutoff Rc from the center of the LWF 
provides a linear scaling algorithm. The errors involved 
in this truncation, which can be reduced arbitrarily by 
increasing the value of Rc, are analyzed in detail in Ref |^. 

After the band energy has been minimized and the 
LWF's obtained, the new charge density is computed, 
completing a so-called SCF cycle. From the density, a 
new Hamiltonian is produced, the procedure being re- 
peated until selfconsistency in the charge density or the 
Hamiltonian is achieved. At this point, the total energy 
can be computed as 



E 



J Vn{r)n{r)dr + ^ J Vi{r)n,{r)dr 
+ I [exc{n)-Vxc{n)]n{r)dr + Un-cc 



(4) 



where Vh (r) is the Hartree potential of the selfconsi&tent 
charge n{r), and, following Sankey and Niklewski,Ll we 
have defined 



C/ii- 



2 V |R. - R. 



V^{r)noir) (5) 



As in the case of the Hamiltonian, we have added and 
subtracted the electrostatic energy of the neutral atom 
charge no{r) to obtain Eq. (^). The advantage, again, is 
that ?7ii-cc can be expressed as a sum of short range coru 
tributions, which is easy to evaluate in 0{N) operations,B 
avoiding the problems related with the long range charac- 
ter of the ionic core interactions. The integrals appearing 
in Eq. (^ are calculated in the real space grid. 
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FIG. 2. CPU time per SCF cycle and job memory for a sim- 
ulation of Si supercells with different sizes. Times measured 
in an IBM PowerPC with 17 Mflops (Linpack 100x100). 

In molecular dynamics (MD) simulations and geomet- 
rical optimizations the atomic forces are needed. We 
compute them using a variation of the Hellman-Feynman 
theorem, which includes Pulay-like corrections to account 
for the fact that the basis set is not complete and moves 
with the atoms. The force on atom i is 



aR. 



9R, 



K + Vxc\M (6) 



where H° = /2m -\- Vnl + V-^a, and p^n and E^i, arg 
the density and energy-density matrices, respectively.a 
The first pthree terms are calculated interpolating the ta- 
ble data,cl whereas the last two terms are computed by 
numerical integration in the grid, as was done for the 
matrix elements of the Hartree and XC potentials in the 
Kohn-Sham Hamiltonian. 

In order to show the linear scaling of the method, we 
have performed calculations on supercells of silicon in 
the diamond structure, with different numbers of atoms 
from 64 to 1000. Only the F point was used to sam- 
ple the Brillouin zone, the cutoff for the charge density 
grid was 12 Ry, and the LWF's were truncated at 4.5 A. 
Figure || shows the linear behavior of the CPU time and 
memory requirements with the number of atoms. The 
CPU time shown represents the average cost to perform 
a SCF step in a MD simulation, including the calcula- 
tion of the charge density and Hamiltonian matrix el- 
ements, the minimization of the band structure energy, 
and the calculation of the atomic forces. The band struc- 
ture energy minimization within each SCF cycle required 
an average of 20 CG iterations, while the number of SCF 
cycles depends largely on the simulation temperature, 
length of the time step and mixing algorithm for selfcon- 
sistency. So far, in comparable simulation conditions, no 
significant dependence of the number of CG iterations 
and SCF cycles with the size of the system has been ob- 
served. As we can see, in the present method both the 
CPU time and memory requirements are small enough to 
permit the calculation of a system of 1000 silicon atoms 
in a very modest workstation. 
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TABLE I. Average radius (f), standard (as) and maximum 
deviation [am = {rmax — rmin)/'2] of radii, and non-planarity 
angle^ </> around pentagons (going from 0° for a planar pentag- 
onal site to 12° for a truncated icosahedron) for the fuUerene 
clusters. We compare the results of the present work with 
those of Itoh et al.,^^ obtained with the Harris functional 
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As an example of a system with partially ionic char- 
acter, and with atoms with compact orbitals, we have 
performed calculations on the f3 phase of C3N4, which 
was proposed, as a potentially very hard material by Liu 
and Cohen.llII The calculations were done in supercells 
of 42 and 224 atoms, with nearly identical results. A 
cutoff of 200 Ry for the charge density grid was used. 
We obtain an accuracy better than 1% in both the lat- 
tice constants and the several inequivalent bond lengths, 
and 10% in Xhe bulk modulus, compared to other LDA 
calculations .0 These results contrast with those of the 
non-selfconsistent Harris functional, which yield errors of 
5% and 16% for the distances and bulk modulus, respec- 
tively, showing that selfconsistency is essential to obtain 
reliable results in this partially polar system. 

We have applied our method to study the struc- 
ture of large, single shell icosahedral fuUerene clusters. 
These are important to understand the observed spheric- 
ity of multishell fuUerenes (buckyonions). For the sin- 
gle shell clusters, elasticity theory, as well as empiri- 
cal potential calculations, predict markedlyj^olyhedral 
shapes. Calculations performed by Ltoh et aij^S using the 
Harris-functional order-A^ method,El agree qualitatively 
with these liesults. However, similar non-selfconsistent 
calculationsH predict that even the large clusters are 
spherical. Here we have repeated the calculations with 
selfconsistent LDA using the present method, thus im- 
proving over the non-selfconsistent nature of the former 
calculations. Using a dynamical quenching algorithm, we 
have computed the equilibrium structure of three icosa- 
hedral fuUerene clusters: Ceo, C240 and C540. A supercell 
geometry was assumed, with a cubic cell with sides of 12 
A for Ceo, 22 A for C240 and 34 A for C54o- The calcu- 
lations were done using a cutoff of 100 Ry for the repre- 
sentation of the charge density in reciprocal space, and 
a different localization radius for a ancijr type Wannier 
functions (2.5 and 4.0 A, respectively) Jl3 Increasing the 
localization radius to 4.1 A (both for a and tt), and/or 
increasing the grid cutoff to 150 Ry in the simulations 
changes the final relative distances in less than 0.4%. The 
results are summarized in Table | We see that our re- 
sults are very similar to those obtained by Itoh et at, 
and confirm that, except for Ceo, the single shell clusters 
tend to be polyhedral, instead of spherical, and that this 
polyhedral character is more pronounced as the cluster 
size increases. 



In conclusion, we have presented an efficient method 
for selfconsistent LDA calculations with linear scaling. 
We have analyzed the performance versus system size and 
grid cutoff, and shown that simulations of systems with 
hundreds of atoms are possible with small workstations. 
This should open the possibility of very large scale ab 
initio simulations in the near future. 
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